    Computational characterization of tandem repeat and non-globular proteins

    One of the first protein structures to be determined was hemoglobin, a globe-like, water-soluble oxygen-transport protein. Since then, protein science has been biased towards this type of protein, termed globular. However, over the last decades, accumulating experimental evidence has pointed to the functional importance of their counterpart, non-globular proteins (NGPs). The definition includes tandem repeats, intrinsically disordered regions, aggregating domains and transmembrane domains. Recognizing and classifying NGPs is essential to shed light on the so-called “dark proteome”, i.e. the large fraction of proteins about which we know almost nothing. I contributed to this goal by developing new resources dedicated to NGPs. My main focus is tandem repeat proteins (TRPs). TRPs are characterized by a repeated sequence that folds into a modular architecture, whose modules are called “units”. The unit is not only the structural but also the evolutionary module, and it is the basis of TRP classification. TRPs are widespread in all types of organisms, where they carry out fundamental functions. The sequences of TRP units diverge quickly while maintaining their fold, which hampers detection by traditional sequence analysis methods. Conversely, the challenges of structure-based repeat detection lie in the multidimensional nature of the data. Specialized methods have been developed for TRP identification; however, few of them annotate individual repeat units. RepeatsDB is a database of TRP structures annotated with the positions of repeat units and insertions. I contributed to the new version of the RepeatsDB database, which was populated using ReUPred, a predictor of tandem repeat units. The quality of RepeatsDB data is guaranteed by manual validation, a time-consuming task that requires a community annotation effort. To facilitate this process, I developed RepeatsDB-Lite, a web server for the prediction and refinement of tandem repeats in protein structures.
Analysing RepeatsDB data, I compared the sequence-based and structure-based classifications of TRPs. Moreover, I provided insights into the role of TRPs in the human proteome by characterizing them in terms of function, protein-protein interaction networks and impact on disease. As a case study, I characterized collagen V, a repeat protein associated with Ehlers-Danlos syndrome, identifying genotype-phenotype correlations in relation to its interaction network model. Another category of NGPs is intrinsically disordered proteins (IDPs), which lack a fixed, ordered structure in their native state. Intrinsic disorder has been shown to be prevalent in the human proteome, to play important signaling and regulatory roles and to be frequently involved in disease. I contributed to MobiDB, a database of protein disorder and mobility annotations that describes several aspects of NGP structure and mechanisms of function. MobiDB provides consensus predictions and functional annotations for all known protein sequences. A common feature of TRPs, IDPs and other NGPs is the presence of low-complexity regions, in which the amino acid distribution deviates from typical amino acid usage. The functional importance of low-complexity regions is closely related to their non-globular arrangement. I contributed to the field with a critical review focusing on the definition of the sequence features of low-complexity regions and their relationship to structural features. Finally, I exploited the knowledge acquired on NGPs in the previous studies to design SODA, one of the first sequence-based methods for predicting protein solubility. SODA uses the aggregation propensity, intrinsic disorder, hydrophobicity and secondary structure preferences derived from a sequence to evaluate the solubility changes introduced by a mutation. The main envisaged applications of SODA are in protein engineering and in studying the impact of protein mutations on disease onset.
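Low-complexity regions are often detected by measuring the compositional (Shannon) entropy of short sliding windows, an idea used by tools such as SEG. The sketch below is a simplified illustration of that idea, not the method of the review mentioned above: the 12-residue window and the 2.2-bit cutoff are assumed, illustrative parameters, and the helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the amino acid composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, cutoff: float = 2.2) -> list[bool]:
    """Flag every residue covered by a window whose entropy is below `cutoff`.

    `window` and `cutoff` are illustrative defaults, not published parameters.
    """
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def mask_to_regions(mask: list[bool]) -> list[tuple[int, int]]:
    """Convert a boolean mask into 1-based, inclusive (start, end) regions."""
    regions, start = [], None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            regions.append((start + 1, i))
            start = None
    if start is not None:
        regions.append((start + 1, len(mask)))
    return regions
```

For example, a 20-residue poly-Q stretch flanked by compositionally diverse segments is reported as a single region that extends a few residues into each flank, because windows straddling the boundary are still dominated by glutamine and have low entropy.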

    MobiDB: Intrinsically disordered proteins in 2021

    The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a redesigned user interface, a more effective search engine and an advanced API for programmatic access. The new database schema gives users more flexibility while simplifying maintenance and updates. In addition, the new entry page provides more visualisation tools, including a customizable feature viewer and graphs of residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, i.e. whether they undergo disorder-to-order transitions or remain disordered in the bound state. Disordered regions undergoing liquid-liquid phase separation or post-translational modifications are also annotated. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, FASTA or JSON format. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at the database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.

    Affiliations:
    Piovesan, Damiano (Università di Padova, Italy)
    Necci, Marco (Università di Padova, Italy)
    Escobedo, Nahuel Abel (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Monzon, Alexander Miguel (Università di Padova, Italy)
    Viczián, András (Università di Padova, Italy)
    Mičetić, Ivan (Università di Padova, Italy)
    Quaglia, Federica (Università di Padova, Italy)
    Paladin, Lisanna (Università di Padova, Italy)
    Ramasamy, Pathmanaban (Vrije Universiteit Brussel, Belgium; University of Ghent, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Belgium)
    Dosztányi, Zsuzsanna (Eötvös Loránd University, Hungary)
    Vranken, Wim F. (Vrije Universiteit Brussel, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Belgium)
    Davey, Norman E. (The Institute of Cancer Research, United Kingdom)
    Parisi, Gustavo Daniel (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Fuxreiter, Monika (Università di Padova, Italy)
    Tosatto, Silvio C. E. (Università di Padova, Italy)
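MobiDB is described above as providing consensus disorder predictions built from several predictors. The exact consensus procedure is not given in this listing, so the following is only a minimal sketch assuming a per-residue majority vote over binary predictions followed by a minimum-length filter; the function name, the 0.5 agreement threshold and the 20-residue minimum are hypothetical choices, not MobiDB's published parameters.

```python
def consensus_disorder(predictions: list[list[int]], min_agreement: float = 0.5,
                       min_length: int = 20) -> list[tuple[int, int]]:
    """Majority-vote consensus of binary per-residue disorder predictions.

    predictions: one list of 0/1 calls per predictor, all of equal length.
    A residue is called disordered when the fraction of predictors calling
    it disordered is >= min_agreement (hypothetical threshold); runs shorter
    than min_length residues (hypothetical cutoff) are discarded.
    Returns 1-based, inclusive (start, end) regions.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence length")
    n = len(predictions)
    votes = [sum(p[i] for p in predictions) / n >= min_agreement for i in range(length)]
    regions, start = [], None
    # A trailing False acts as a sentinel that closes a run ending at the C-terminus.
    for i, is_disordered in enumerate(votes + [False]):
        if is_disordered and start is None:
            start = i
        elif not is_disordered and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions
```

A voting scheme like this trades sensitivity for robustness: a region is kept only where most predictors agree, so idiosyncratic calls by a single method are filtered out.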

    MobiDB 3.0: More annotations for intrinsic disorder, conformational diversity and interactions in proteins

    The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as residues with high temperature factors in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance (NMR) chemical shift data provide an additional experimental layer of information on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set, and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.

    Affiliations:
    Piovesan, Damiano (Università di Padova, Italy)
    Tabaro, Francesco (Università di Padova, Italy)
    Paladin, Lisanna (Università di Padova, Italy)
    Necci, Marco (Università di Padova, Italy; Istituto Agrario San Michele all'Adige, Fondazione Edmund Mach, Italy)
    Mičetić, Ivan (Università di Padova, Italy)
    Camilloni, Carlo (Università degli Studi di Milano, Italy)
    Davey, Norman (University of Dublin, Ireland)
    Dosztányi, Zsuzsanna (Eötvös Loránd University, Hungary)
    Mészáros, Bálint (Eötvös Loránd University, Hungary)
    Monzón, Alexander (Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina)
    Parisi, Gustavo Daniel (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Schad, Eva (Hungarian Academy of Sciences, Hungary)
    Sormanni, Pietro (University of Cambridge, United Kingdom)
    Tompa, Peter (Vrije Universiteit Brussel, Belgium)
    Vendruscolo, Michele (University of Cambridge, United Kingdom)
    Vranken, Wim F. (Vrije Universiteit Brussel, Belgium)
    Tosatto, Silvio C. E. (Università di Padova, Italy)
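One kind of indirect evidence for mobility is residues with high crystallographic temperature factors (B-factors). Because the absolute B-factor scale depends on resolution and refinement, values are usually compared within a chain. The following minimal sketch flags residues by per-chain z-score; it assumes per-residue B-factors (for example, of C-alpha atoms) have already been extracted, and the function name and the 2.0 cutoff are hypothetical, not MobiDB's criterion.

```python
import statistics


def high_bfactor_residues(bfactors: list[float], z_cutoff: float = 2.0) -> list[int]:
    """Return 1-based positions whose B-factor z-score exceeds z_cutoff.

    B-factors are standardized within the chain (population standard
    deviation); z_cutoff is a hypothetical threshold.
    """
    if len(bfactors) < 2:
        return []
    mean = statistics.mean(bfactors)
    sd = statistics.pstdev(bfactors)
    if sd == 0:
        return []  # a flat profile carries no relative mobility signal
    return [i + 1 for i, b in enumerate(bfactors) if (b - mean) / sd > z_cutoff]
```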

    RepeatsDB in 2021: Improved data and extended classification for protein tandem repeat structures

    The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

    Affiliations:
    Paladin, Lisanna (Università di Padova, Italy)
    Bevilacqua, Martina (Università di Padova, Italy)
    Errigo, Sara (Università di Padova, Italy)
    Piovesan, Damiano (Università di Padova, Italy)
    Mičetić, Ivan (Università di Padova, Italy)
    Necci, Marco (Università di Padova, Italy)
    Monzon, Alexander Miguel (Università di Padova, Italy)
    Fabre, Maria Laura (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Biotecnología y Biología Molecular, Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Instituto de Biotecnología y Biología Molecular, Argentina; Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Departamento de Ciencias Biológicas, Argentina)
    López, José Luis (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Biotecnología y Biología Molecular, Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Instituto de Biotecnología y Biología Molecular, Argentina; Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Departamento de Ciencias Biológicas, Argentina)
    Nilsson, Juliet Fernanda (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Biotecnología y Biología Molecular, Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Instituto de Biotecnología y Biología Molecular, Argentina; Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Departamento de Ciencias Biológicas, Argentina)
    Ríos, Javier Sebastián (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Lorenzano Menna, Pablo (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Cabrera, Maia Diana Eliana (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    González Buitrón, Martín (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Gonçalves Kulik, Mariane (Johannes Gutenberg-Universität Mainz, Germany)
    Fernández Alberti, Sebastián (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Fornasari, Maria Silvina (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Parisi, Gustavo Daniel (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina)
    Lagares, Antonio (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Biotecnología y Biología Molecular, Universidad Nacional de La Plata, Facultad de Ciencias Exactas, Instituto de Biotecnología y Biología Molecular, Argentina; Universidad Nacional de La Plata, Facultad de Ciencias Agrarias y Forestales, Departamento de Ciencias Biológicas, Argentina)
    Hirsh, Layla (Pontificia Universidad Católica del Perú, Peru)
    Andrade Navarro, Miguel A. (Johannes Gutenberg-Universität Mainz, Germany)
    Kajava, Andrey V. (Centre National de la Recherche Scientifique, France)
    Tosatto, Silvio C. E. (Università di Padova, Italy)
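RepeatsDB 3.0 organizes entries in a five-level hierarchy (Class > Topology > Fold > Clan > Family) and adds a UniProt-centric view over structures of the same protein. The sketch below models both ideas with a hypothetical record type; it does not reflect the actual RepeatsDB schema or API, grouping is done by UniProt accession only (a simplification of "identical or similar sequences"), and all identifiers in the example, such as `pdb1:A` and `UNIPROT_1`, are placeholders.

```python
from collections import Counter
from dataclasses import dataclass

# Top three levels are structure-based; the last two require sequence similarity.
LEVELS = ("class", "topology", "fold", "clan", "family")


@dataclass(frozen=True)
class RepeatEntry:
    """Hypothetical record: a PDB chain, its UniProt accession and its
    classification path from the top level down. Entries classified on
    structure alone may stop at the fold level."""
    pdb_chain: str
    uniprot: str
    path: tuple[str, ...]


def count_at_level(entries: list[RepeatEntry], level: str) -> Counter:
    """Count entries per node at `level`, keyed by the full path down to it."""
    depth = LEVELS.index(level) + 1
    return Counter(e.path[:depth] for e in entries if len(e.path) >= depth)


def group_by_uniprot(entries: list[RepeatEntry]) -> dict[str, list[str]]:
    """Collect all PDB chains that map to the same UniProt accession."""
    groups: dict[str, list[str]] = {}
    for e in entries:
        groups.setdefault(e.uniprot, []).append(e.pdb_chain)
    return groups
```

Keying nodes by their full path (rather than by the last label alone) keeps identically named nodes under different parents distinct, which matters once lower levels are browsed independently.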

    Global network of computational biology communities: ISCB's regional student groups breaking barriers [version 1; peer review: Not peer reviewed]

    Regional Student Groups (RSGs) of the International Society for Computational Biology Student Council (ISCB-SC) have been instrumental in connecting computational biologists globally and in raising awareness of bioinformatics education. This article highlights the initiatives carried out by the RSGs, both nationally and internationally, to strengthen the present and future of the bioinformatics community. Moreover, we discuss the future directions the organization will take and the challenges it faces in advancing the ISCB-SC's main mission: “Nurture the new generation of computational biologists”.

    Affiliations:
    Shome, Sayane (University of Iowa, United States)
    Parra, Rodrigo Gonzalo (European Molecular Biology Laboratory, Germany; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina)
    Fatima, Nazeefa (Uppsala Universitet, Sweden)
    Monzon, Alexander Miguel (Università di Padova, Italy)
    Cuypers, Bart (Universiteit Antwerpen, Belgium)
    Moosa, Yumna (University of KwaZulu-Natal, South Africa)
    Da Rocha Coimbra, Nilson (Universidade Federal de Minas Gerais, Brazil)
    Assis, Juliana (Universidade Federal de Minas Gerais, Brazil)
    Giner Delgado, Carla (Universitat Autònoma de Barcelona, Spain)
    Dönertaş, Handan Melike (European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom)
    Cuesta Astroz, Yesid (Universidad de Antioquia, Colombia; Universidad CES, Facultad de Medicina, Colombia)
    Saarunya, Geetha (University of South Carolina, United States)
    Allali, Imane (Université Mohammed V, Rabat, Morocco; University of Cape Town, South Africa)
    Gupta, Shruti (Jawaharlal Nehru University, India)
    Srivastava, Ambuj (Indian Institute of Technology Madras, India)
    Kalsan, Manisha (Jawaharlal Nehru University, India)
    Valdivia, Catalina (Universidad Andrés Bello, Chile)
    Olguín Orellana, Gabriel José (Universidad de Talca, Chile)
    Papadimitriou, Sofia (Vrije Universiteit Brussel, Belgium; Université Libre de Bruxelles, Belgium)
    Parisi, Daniele (Katholieke Universiteit Leuven, Belgium)
    Kristensen, Nikolaj Pagh (Technical University of Denmark, Denmark)
    Rib, Leonor (University of Copenhagen, Denmark)
    Guebila, Marouen Ben (University of Luxembourg, Luxembourg)
    Bauer, Eugen (University of Luxembourg, Luxembourg)
    Zaffaroni, Gaia (University of Luxembourg, Luxembourg)
    Bekkar, Amel (Université de Lausanne, Switzerland)
    Ashano, Efejiro (APIN Public Health Initiatives, Nigeria)
    Paladin, Lisanna (Università di Padova, Italy)
    Necci, Marco (Università di Padova, Italy)
    Moreyra, Nicolás Nahuel (Consejo Nacional de Investigaciones Científicas y Técnicas, Oficina de Coordinación Administrativa Ciudad Universitaria, Instituto de Ecología, Genética y Evolución de Buenos Aires, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Instituto de Ecología, Genética y Evolución de Buenos Aires, Argentina)

    DisProt: intrinsic protein disorder annotation in 2020

    The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments in DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text-mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows more information to be captured from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be used effectively to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore to illuminating the ‘dark’ proteome.
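Curated disorder annotations are region-based, and several pieces of evidence can cover overlapping stretches of the same protein. Summary statistics such as disorder content therefore need overlapping regions merged first. The following is a minimal sketch over 1-based inclusive (start, end) pairs; this input format is a hypothetical simplification, not the DisProt entry schema or API.

```python
def merge_regions(regions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent 1-based, inclusive regions."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def disorder_content(regions: list[tuple[int, int]], length: int) -> float:
    """Fraction of residues covered by at least one annotated region."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    covered = sum(end - start + 1 for start, end in merge_regions(regions))
    return covered / length
```

Without merging, residues supported by two overlapping annotations would be counted twice and the disorder fraction could exceed 1.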

    Critical assessment of protein intrinsic disorder prediction

    Intrinsically disordered proteins, which defy the traditional protein structure–function paradigm, are challenging to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that these predictions be accurate. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in predicting intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times can vary by up to four orders of magnitude among methods.
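Fmax, the score used above to rank predictors, is the maximum F1 score (harmonic mean of precision and recall on the disordered class) over all thresholds applied to a predictor's per-residue scores. The minimal sketch below scans each distinct predicted score as a threshold; it is an illustration of the metric, not the CAID evaluation code, whose threshold grid and handling of unannotated residues may differ.

```python
def f_max(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (best F1, threshold) over thresholds taken from the scores.

    A residue is predicted disordered when score >= threshold;
    labels are 1 for disordered and 0 for ordered residues.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(labels)
    best_f1, best_t = 0.0, float("nan")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / positives
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t
```

For scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.2]` with labels `[1, 1, 0, 1, 0, 0]`, the best threshold is 0.4 (precision 0.75, recall 1.0), giving Fmax = 6/7.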
